An Automatic Procedure for Generating Datasets for Conversational Recommender Systems
Abstract
Conversational Recommender Systems assist online users in their information-seeking and decision-making tasks by supporting an interactive process aimed at finding the most appealing items according to the user's preferences. Unfortunately, collecting dialogue data to train these systems can be labour-intensive, especially for data-hungry Deep Learning models. Therefore, we propose an automatic procedure able to generate plausible dialogues from recommender systems datasets.

People have information needs of varying complexity, which can be met by an intelligent agent able to answer properly formulated questions, possibly taking user context and preferences into account. Conversational Recommender Systems (CRS) assist online users in their information-seeking and decision-making tasks by supporting an interactive process [1] aimed at finding the most appealing items according to the user's preferences. Unfortunately, collecting the dialogue data required to train these systems can be very labour-intensive, especially for the latest data-hungry Deep Learning models. For this reason, synthetic dialogue datasets can be extremely useful for bootstrapping effective dialogue systems able to support a goal-oriented conversation with the user. Therefore, we propose an automatic procedure able to generate plausible dialogues directly from well-known recommender systems datasets, exploiting data from the Linked Open Data Cloud and contextual information related to the user. The dialogue datasets generated from the MovieLens 1M and MovieTweetings datasets can be found at: http://github.com/swapUniba/ConvRecSysDataset. The source code of the automatic procedure for generating conversational recommender systems datasets will be released once the paper is accepted.

Given a user u and their set of binary preferences, we train a decision tree on the preferences that u expressed towards items, where each item is represented by Linked Open Data binary features extracted from the Wikidata knowledge base (http://www.wikidata.org). In particular, each predicate-object pair is represented as a binary feature which is 1 if the knowledge base contains the triple (item, predicate, object), 0 otherwise. The considered predicates are wdt:P57 (director), wdt:P161 (cast member) and wdt:P136 (genre).

The dialogue generation procedure is an iterative algorithm which is executed until all user preferences have been used. At each step, a top-n list of positive and negative items is generated by randomly choosing from the positive and negative preferences of the given user u. Then, paths from the root of the decision tree to the consistently classified examples are exploited to generate a sequence of questions, randomly chosen according to a binomial distribution over the item features, to elicit user preferences. Depending on the percentage of positive items in the top-n list, a “refine” step is triggered, which extends the dialogue with additional questions that lead to a list of suggestions containing only positive items.

Table 1 shows a conversation generated by applying the designed procedure to the well-known MovieLens 1M recommender systems dataset. In the first part of the conversation, utterances that introduce the user are generated by exploiting the contextual information included in the dataset.

Agent   Utterance
user    Hey
bot     What is your name?
user    I am Sofia
bot     How old are you?
user    I am 25
bot     What is your occupation?
user    sales/marketing
bot     Which are your favourite movies?
user    I love Speed
bot     I suggest you Erin Brockovich, Witness, Pocahontas, Four Weddings and a Funeral, Anna and the King. Do you like them?
user    I hate them
bot     What are your favourite directors?
user    I like John Waters
bot     What are your favourite actors?
user    I like Josef Sommer, Wade Williams, Marg Helgenberger, Jeroen Krabbé
bot     What genres do you like?
user    My favourite genres are teen film, romance film, biographical film
bot     I suggest you Erin Brockovich, Witness, Ever After, Simply Irresistible, Hairspray. Do you like them?
user    I like them
bot     I am glad that you like them

Table 1. Conversation generated from the MovieLens 1M dataset. To help reading, Wikidata ids have been replaced with the corresponding entities.

In this work we have proposed an automatic procedure able to generate synthetic dialogue datasets starting from well-known datasets in the recommender system field. The presented procedure is completely generic and can be applied to any dataset that contains binary user preferences and whose items have a corresponding identifier in the Linked Open Data Cloud.
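The binary item representation described in the text (one feature per predicate-object pair, set to 1 when the knowledge base contains the triple) can be illustrated with a minimal Python sketch. This is not the authors' code, which the text says is yet to be released. The function names, the in-memory set of triples and the toy data are assumptions made for illustration, and readable labels stand in for Wikidata ids.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (item, predicate, object)

# Predicates named in the text: director, cast member, genre.
PREDICATES = ("wdt:P57", "wdt:P161", "wdt:P136")


def build_feature_space(triples: Set[Triple]) -> List[Tuple[str, str]]:
    """Collect every (predicate, object) pair for the considered predicates."""
    pairs = {(p, o) for (_, p, o) in triples if p in PREDICATES}
    return sorted(pairs)  # sorted only to get a stable column order


def encode_item(item: str, triples: Set[Triple],
                features: List[Tuple[str, str]]) -> List[int]:
    """Binary vector: 1 if (item, predicate, object) is in the triples, else 0."""
    return [1 if (item, p, o) in triples else 0 for (p, o) in features]


# Hypothetical toy knowledge base; labels replace Wikidata ids for readability.
TRIPLES: Set[Triple] = {
    ("Hairspray", "wdt:P57", "John Waters"),
    ("Hairspray", "wdt:P136", "teen film"),
    ("Erin Brockovich", "wdt:P161", "Marg Helgenberger"),
    ("Erin Brockovich", "wdt:P136", "biographical film"),
    ("Erin Brockovich", "wdt:P577", "2000"),  # dropped: predicate not considered
}

FEATURES = build_feature_space(TRIPLES)
VECTORS: Dict[str, List[int]] = {
    item: encode_item(item, TRIPLES, FEATURES)
    for item in ("Hairspray", "Erin Brockovich")
}

for item, vector in VECTORS.items():
    print(item, vector)
```

Vectors of this kind, paired with the user's binary like/dislike labels, would form the training set for the per-user decision tree. Any standard decision tree learner could be used for that step, since the text does not specify one.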
Similar resources
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...
A social recommender system based on matrix factorization considering dynamics of user preferences
With the expansion of social networks, the use of recommender systems in these networks has attracted considerable attention. Recommender systems have become an important tool for alleviating the information overload problem of users by providing personalized recommendations of items that a user might like, based on past preferences or observed behavior about one or various items. In these systems...
A NOVEL FUZZY-BASED SIMILARITY MEASURE FOR COLLABORATIVE FILTERING TO ALLEVIATE THE SPARSITY PROBLEM
Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...
Rapid Development of Knowledge-Based Conversational Recommender Applications with Advisor Suite
Knowledge-based recommender systems are Web-based applications that exploit deep domain knowledge for generating buying proposals that match the individual needs and requirements of an online user. As in many domains the detailed customer requirements have to be elicited in an interactive dialog before the recommendation can be made, the development and in particular also the maintenance of the...
A Comparative Study of Compound Critique Generation in Conversational Recommender Systems
Critiquing techniques provide an easy way for users to feedback their preferences over one or several attributes of the products in a conversational recommender system. While unit critiques only allow users to critique one attribute of the products each time, a well-generated set of compound critiques enables users to input their preferences on several attributes at the same time, and can poten...
Publication date: 2017